Hackathon
Data Analysis on Electric Vehicles
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import re
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
# loading the data set
df=pd.read_csv(r"C:\data\dataset(hackthon).csv")
Dataset Description
VIN (1-10)-Vehicle Identification Number of the vehicle mentioned in the dataset.
County-Name of the County from where the data is gathered.
City-Name of the Cities from where the data is gathered.
State-Name of the State from where the data is gathered.
Postal Code-The postal code from where the data is gathered.
Model Year-Manufacturing year of the model mentioned in the data set.
Make-Manufacturer of the vehicle.
Model-Model Name of the mentioned vehicle.
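Electric Vehicle Type-Whether the vehicle is a Battery Electric Vehicle (BEV) or a Plug-in Hybrid Electric Vehicle (PHEV).
Clean Alternative Fuel Vehicle (CAFV) Eligibility-Whether the vehicle qualifies as a Clean Alternative Fuel Vehicle, for example "Clean Alternative Fuel Vehicle Eligible" or "Not eligible due to low battery range".
Vehicle Location-Longitude and latitude of the vehicle's location, stored as POINT (longitude latitude).
Electric Range-Distance the vehicle can travel on electric power; the analysis below reads these values as miles.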
# Sample data to understand the Data
df.head()
| | VIN (1-10) | County | City | State | Postal Code | Model Year | Make | Model | Electric Vehicle Type | Clean Alternative Fuel Vehicle (CAFV) Eligibility | Electric Range | Base MSRP | Legislative District | DOL Vehicle ID | Vehicle Location | Electric Utility | 2020 Census Tract |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | JTMEB3FV6N | Monroe | Key West | FL | 33040 | 2022 | TOYOTA | RAV4 PRIME | Plug-in Hybrid Electric Vehicle (PHEV) | Clean Alternative Fuel Vehicle Eligible | 42 | 0 | NaN | 198968248 | POINT (-81.80023 24.5545) | NaN | 12087972100 |
| 1 | 1G1RD6E45D | Clark | Laughlin | NV | 89029 | 2013 | CHEVROLET | VOLT | Plug-in Hybrid Electric Vehicle (PHEV) | Clean Alternative Fuel Vehicle Eligible | 38 | 0 | NaN | 5204412 | POINT (-114.57245 35.16815) | NaN | 32003005702 |
| 2 | JN1AZ0CP8B | Yakima | Yakima | WA | 98901 | 2011 | NISSAN | LEAF | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 73 | 0 | 15.0 | 218972519 | POINT (-120.50721 46.60448) | PACIFICORP | 53077001602 |
| 3 | 1G1FW6S08H | Skagit | Concrete | WA | 98237 | 2017 | CHEVROLET | BOLT EV | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 238 | 0 | 39.0 | 186750406 | POINT (-121.7515 48.53892) | PUGET SOUND ENERGY INC | 53057951101 |
| 4 | 3FA6P0SU1K | Snohomish | Everett | WA | 98201 | 2019 | FORD | FUSION | Plug-in Hybrid Electric Vehicle (PHEV) | Not eligible due to low battery range | 26 | 0 | 38.0 | 2006714 | POINT (-122.20596 47.97659) | PUGET SOUND ENERGY INC | 53061041500 |
# columns in dataframe
df.columns
Index(['VIN (1-10)', 'County', 'City', 'State', 'Postal Code', 'Model Year',
'Make', 'Model', 'Electric Vehicle Type',
'Clean Alternative Fuel Vehicle (CAFV) Eligibility', 'Electric Range',
'Base MSRP', 'Legislative District', 'DOL Vehicle ID',
'Vehicle Location', 'Electric Utility', '2020 Census Tract'],
dtype='object')
# Info of the data
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 112634 entries, 0 to 112633
Data columns (total 17 columns):
 #   Column                                             Non-Null Count   Dtype
---  ------                                             --------------   -----
 0   VIN (1-10)                                         112634 non-null  object
 1   County                                             112634 non-null  object
 2   City                                               112634 non-null  object
 3   State                                              112634 non-null  object
 4   Postal Code                                        112634 non-null  int64
 5   Model Year                                         112634 non-null  int64
 6   Make                                               112634 non-null  object
 7   Model                                              112614 non-null  object
 8   Electric Vehicle Type                              112634 non-null  object
 9   Clean Alternative Fuel Vehicle (CAFV) Eligibility  112634 non-null  object
 10  Electric Range                                     112634 non-null  int64
 11  Base MSRP                                          112634 non-null  int64
 12  Legislative District                               112348 non-null  float64
 13  DOL Vehicle ID                                     112634 non-null  int64
 14  Vehicle Location                                   112610 non-null  object
 15  Electric Utility                                   112191 non-null  object
 16  2020 Census Tract                                  112634 non-null  int64
dtypes: float64(1), int64(6), object(10)
memory usage: 14.6+ MB
# checking null values in data
df.isnull().sum()
VIN (1-10)                                             0
County                                                 0
City                                                   0
State                                                  0
Postal Code                                            0
Model Year                                             0
Make                                                   0
Model                                                 20
Electric Vehicle Type                                  0
Clean Alternative Fuel Vehicle (CAFV) Eligibility      0
Electric Range                                         0
Base MSRP                                              0
Legislative District                                 286
DOL Vehicle ID                                         0
Vehicle Location                                      24
Electric Utility                                     443
2020 Census Tract                                      0
dtype: int64
df.duplicated().sum()
0
df
| | VIN (1-10) | County | City | State | Postal Code | Model Year | Make | Model | Electric Vehicle Type | Clean Alternative Fuel Vehicle (CAFV) Eligibility | Electric Range | Base MSRP | Legislative District | DOL Vehicle ID | Vehicle Location | Electric Utility | 2020 Census Tract |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | JTMEB3FV6N | Monroe | Key West | FL | 33040 | 2022 | TOYOTA | RAV4 PRIME | Plug-in Hybrid Electric Vehicle (PHEV) | Clean Alternative Fuel Vehicle Eligible | 42 | 0 | NaN | 198968248 | POINT (-81.80023 24.5545) | NaN | 12087972100 |
| 1 | 1G1RD6E45D | Clark | Laughlin | NV | 89029 | 2013 | CHEVROLET | VOLT | Plug-in Hybrid Electric Vehicle (PHEV) | Clean Alternative Fuel Vehicle Eligible | 38 | 0 | NaN | 5204412 | POINT (-114.57245 35.16815) | NaN | 32003005702 |
| 2 | JN1AZ0CP8B | Yakima | Yakima | WA | 98901 | 2011 | NISSAN | LEAF | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 73 | 0 | 15.0 | 218972519 | POINT (-120.50721 46.60448) | PACIFICORP | 53077001602 |
| 3 | 1G1FW6S08H | Skagit | Concrete | WA | 98237 | 2017 | CHEVROLET | BOLT EV | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 238 | 0 | 39.0 | 186750406 | POINT (-121.7515 48.53892) | PUGET SOUND ENERGY INC | 53057951101 |
| 4 | 3FA6P0SU1K | Snohomish | Everett | WA | 98201 | 2019 | FORD | FUSION | Plug-in Hybrid Electric Vehicle (PHEV) | Not eligible due to low battery range | 26 | 0 | 38.0 | 2006714 | POINT (-122.20596 47.97659) | PUGET SOUND ENERGY INC | 53061041500 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 112629 | 7SAYGDEF2N | King | Duvall | WA | 98019 | 2022 | TESLA | MODEL Y | Battery Electric Vehicle (BEV) | Eligibility unknown as battery range has not b... | 0 | 0 | 45.0 | 217955265 | POINT (-121.98609 47.74068) | PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) | 53033032401 |
| 112630 | 1N4BZ1CP7K | San Juan | Friday Harbor | WA | 98250 | 2019 | NISSAN | LEAF | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 150 | 0 | 40.0 | 103663227 | POINT (-123.01648 48.53448) | BONNEVILLE POWER ADMINISTRATION||ORCAS POWER &... | 53055960301 |
| 112631 | 1FMCU0KZ4N | King | Vashon | WA | 98070 | 2022 | FORD | ESCAPE | Plug-in Hybrid Electric Vehicle (PHEV) | Clean Alternative Fuel Vehicle Eligible | 38 | 0 | 34.0 | 193878387 | POINT (-122.4573 47.44929) | PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) | 53033027702 |
| 112632 | KNDCD3LD4J | King | Covington | WA | 98042 | 2018 | KIA | NIRO | Plug-in Hybrid Electric Vehicle (PHEV) | Not eligible due to low battery range | 26 | 0 | 47.0 | 125039043 | POINT (-122.09124 47.33778) | PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) | 53033032007 |
| 112633 | YV4BR0CL8N | King | Covington | WA | 98042 | 2022 | VOLVO | XC90 | Plug-in Hybrid Electric Vehicle (PHEV) | Not eligible due to low battery range | 18 | 0 | 47.0 | 194673692 | POINT (-122.09124 47.33778) | PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) | 53033032005 |
112634 rows × 17 columns
df['Electric Utility'] = df['Electric Utility'].fillna('Utility Not Available')
df['Legislative District'] = df['Legislative District'].fillna('Unknown')
df['Vehicle Location'] = df['Vehicle Location'].fillna('Unknown')
df['Model'] = df['Model'].fillna('Unknown')
df['2020 Census Tract'] = df['2020 Census Tract'].fillna('Unknown')
df['City'] = df['City'].fillna('Unknown')
df['Postal Code'] = df['Postal Code'].astype(int)
df.shape
(112634, 17)
df.isna().sum()
VIN (1-10)                                           0
County                                               0
City                                                 0
State                                                0
Postal Code                                          0
Model Year                                           0
Make                                                 0
Model                                                0
Electric Vehicle Type                                0
Clean Alternative Fuel Vehicle (CAFV) Eligibility    0
Electric Range                                       0
Base MSRP                                            0
Legislative District                                 0
DOL Vehicle ID                                       0
Vehicle Location                                     0
Electric Utility                                     0
2020 Census Tract                                    0
dtype: int64
# Checking the shape
df.shape
(112634, 17)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 112634 entries, 0 to 112633
Data columns (total 17 columns):
 #   Column                                             Non-Null Count   Dtype
---  ------                                             --------------   -----
 0   VIN (1-10)                                         112634 non-null  object
 1   County                                             112634 non-null  object
 2   City                                               112634 non-null  object
 3   State                                              112634 non-null  object
 4   Postal Code                                        112634 non-null  int32
 5   Model Year                                         112634 non-null  int64
 6   Make                                               112634 non-null  object
 7   Model                                              112634 non-null  object
 8   Electric Vehicle Type                              112634 non-null  object
 9   Clean Alternative Fuel Vehicle (CAFV) Eligibility  112634 non-null  object
 10  Electric Range                                     112634 non-null  int64
 11  Base MSRP                                          112634 non-null  int64
 12  Legislative District                               112634 non-null  object
 13  DOL Vehicle ID                                     112634 non-null  int64
 14  Vehicle Location                                   112634 non-null  object
 15  Electric Utility                                   112634 non-null  object
 16  2020 Census Tract                                  112634 non-null  int64
dtypes: int32(1), int64(5), object(11)
memory usage: 14.2+ MB
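The second df.info() output lists Legislative District as object, where the first listed it as float64. That is a side effect of filling a numeric column with the string 'Unknown'. The sketch below reproduces the effect on a small made-up Series (the values are illustrative, not taken from the full dataset) and shows one possible alternative, a nullable integer dtype, which is an assumption about what might suit the analysis and not something the notebook uses.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 'Legislative District' column (illustrative values).
district = pd.Series([15.0, 39.0, np.nan, 38.0], name="Legislative District")

# Filling a float column with a string converts the whole column to object dtype.
as_text = district.fillna("Unknown")

# Possible alternative: a nullable integer dtype keeps the column numeric
# and represents the missing entry as <NA>.
as_nullable = district.astype("Int64")

print(district.dtype, as_text.dtype, as_nullable.dtype)
```

Since the notebook drops this column before plotting, the dtype change has no effect on the charts that follow.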
df.drop(['Postal Code','Base MSRP','Legislative District','DOL Vehicle ID','Electric Utility','2020 Census Tract'],axis=1,inplace=True)
Task - 1
company_counts = df.groupby('Make').count().sort_values(by='City', ascending=False)['City'].reset_index()
top_10 = company_counts[:10]
# Create the bar chart
fig = px.bar(top_10, x='Make', y='City', labels={'Make': 'Companies', 'City': 'Count'},
             title='Top 10 Electric Vehicle Companies by Number of Registered Vehicles', color='City',
             color_continuous_scale='Viridis')
# Show the plot
fig.show()
This plot shows the top 10 electric vehicle companies by number of registered vehicles. The count is taken from the City column, so each bar is the number of vehicle records for that make, not the number of distinct cities.
Tesla is the clear leader, with far more registered vehicles than any other make.
Nissan is the second-most popular electric vehicle company.
Chevrolet is the third-most popular electric vehicle company.
Ford is the fourth-most popular electric vehicle company.
BMW is the fifth-most popular electric vehicle company.
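The difference between counting vehicle records and counting distinct cities can be seen on a small made-up frame. This is a sketch with toy data only; the column names match the dataset, but the rows are invented for illustration.

```python
import pandas as pd

# Toy data: three Tesla records in two cities, two Nissan records in two cities.
toy = pd.DataFrame({
    "Make": ["TESLA", "TESLA", "TESLA", "NISSAN", "NISSAN"],
    "City": ["Bellevue", "Bellevue", "Redmond", "Yakima", "Olympia"],
})

# count() gives the number of non-null City values, i.e. vehicle records per make.
vehicles_per_make = toy.groupby("Make")["City"].count()

# nunique() gives the number of distinct cities per make.
cities_per_make = toy.groupby("Make")["City"].nunique()

print(vehicles_per_make)
print(cities_per_make)
```

The charts in this notebook use the first kind of count, so they rank makes by registered vehicles.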
Counties = df.groupby('County').count().sort_values(by='City',ascending=False)['City'].index
values = df.groupby('County').count().sort_values(by='City',ascending=False)['City'].values
px.bar(x=list(Counties)[:10],y=values[:10],labels={'x':"County Name",'y':"Number of Cars"},color=values[:10])
Top 10 vehicle makes and models
Companies = df.groupby('Make').count().sort_values(by='City',ascending=False)['City'].index
values = df.groupby('Make').count().sort_values(by='City',ascending=False)['City'].values
top_n = 10
top_companies = company_counts[:top_n].reset_index()
fig = px.bar(top_companies, x='Make', y='City', labels={'Make': 'Companies', 'City': 'Count'},
title='Top Companies Producing Electric Vehicles', color='City',
color_continuous_scale='Viridis')
fig.update_layout(xaxis_tickangle=-45)
fig.show()
px.pie(names=list(Companies)[:10],values=values[:10],width=500,height=400)
The bar chart of the top 10 electric vehicle models, ranked by number of registered vehicles, gives the following insights:
The Tesla Model 3 is the most popular electric vehicle model.
The Tesla Model Y is the second most popular electric vehicle model.
The Nissan LEAF is the third most popular electric vehicle model.
The Tesla Model S and Model X are also popular models.
What is the top-selling model for each company?
# Get the top 10 models by number of registered vehicles
model_counts = df.groupby('Model').count().sort_values(by='City', ascending=False)['City'].reset_index()
top_10 = model_counts[:10]
# Create the bar chart
fig = px.bar(top_10, x='Model', y='City', labels={'Model': 'Models', 'City': 'Count'},
             title='Top 10 Electric Vehicle Models by Number of Registered Vehicles', color='City',
             color_continuous_scale='Viridis')
# Show the plot
fig.show()
top_10_companies = list(Companies)[:10]
for i in top_10_companies:
    data = df[df['Make']==i]
    data = data.groupby('Model').count().sort_values(by='City',ascending=False).index
    print('Top selling model for',i,'is ----------->',data[0])
Top selling model for TESLA is -----------> MODEL 3
Top selling model for NISSAN is -----------> LEAF
Top selling model for CHEVROLET is -----------> BOLT EV
Top selling model for FORD is -----------> FUSION
Top selling model for BMW is -----------> I3
Top selling model for KIA is -----------> NIRO
Top selling model for TOYOTA is -----------> PRIUS PRIME
Top selling model for VOLKSWAGEN is -----------> ID.4
Top selling model for AUDI is -----------> E-TRON
Top selling model for VOLVO is -----------> XC90
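The same lookup can be written without an explicit loop. The sketch below is one possible approach on invented toy rows, using a per-group value_counts; it is not the notebook's method, and ties between models would be resolved arbitrarily.

```python
import pandas as pd

# Toy data: made-up registrations for two makes.
toy = pd.DataFrame({
    "Make": ["TESLA", "TESLA", "TESLA", "NISSAN", "NISSAN"],
    "Model": ["MODEL 3", "MODEL 3", "MODEL Y", "LEAF", "LEAF"],
})

# For each make, take the model that occurs most often.
top_model = toy.groupby("Make")["Model"].agg(lambda s: s.value_counts().idxmax())

print(top_model)
```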
#Percentage of BEV vs PHEV
Vehicle_type = list(df.groupby('Electric Vehicle Type').count()['County'].index)
values = df.groupby('Electric Vehicle Type').count()['County'].values
px.pie(names=Vehicle_type,values=values,height=400)
The majority of the vehicles are Battery Electric Vehicles (BEVs).
# What percentage of each top 10 company's vehicles are BEVs and PHEVs?
for index,i in enumerate(top_10_companies):
    data = df[df['Make']==i]
    labels = list(data.groupby('Electric Vehicle Type').count()['City'].index)
    values = list(data.groupby('Electric Vehicle Type').count()['City'].values)
    fig = px.pie(names=labels,values=values,width=700,height=400,title=str(i))
    fig.show()
These plots show the distribution of electric vehicle types for each of the top 10 companies:
Tesla, Nissan, and Volkswagen produce mostly battery electric vehicles (BEVs).
The other companies have a larger share of plug-in hybrid electric vehicles (PHEVs).
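To read the BEV and PHEV split as numbers instead of from ten separate pie charts, the shares can be tabulated per make. The following is a minimal sketch on invented rows; the short labels "BEV" and "PHEV" stand in for the longer strings used in the dataset.

```python
import pandas as pd

# Toy data: made-up vehicle types for three makes.
toy = pd.DataFrame({
    "Make": ["TESLA", "TESLA", "FORD", "FORD", "NISSAN"],
    "Electric Vehicle Type": ["BEV", "BEV", "BEV", "PHEV", "BEV"],
})

# Share of each vehicle type within each make, as a percentage.
shares = (
    toy.groupby("Make")["Electric Vehicle Type"]
       .value_counts(normalize=True)
       .unstack(fill_value=0)
       .mul(100)
)

print(shares)
```

Each row of the result sums to 100, so a make that produces only BEVs shows 100 in the BEV column and 0 in the PHEV column.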
year_wise_cars = df.groupby('Model Year')['VIN (1-10)'].count().reset_index()
year_wise_cars.columns = ['year','num_cars']
fig = px.line(year_wise_cars,x="year", y="num_cars", title='Year Wise Number of Cars',markers=True)
fig.show()
year_wise_cars.sort_values(by='num_cars', ascending=False).head(10)
| | year | num_cars |
|---|---|---|
| 18 | 2022 | 26530 |
| 17 | 2021 | 18364 |
| 14 | 2018 | 14246 |
| 16 | 2020 | 11038 |
| 15 | 2019 | 10266 |
| 13 | 2017 | 8644 |
| 12 | 2016 | 5735 |
| 11 | 2015 | 4940 |
| 9 | 2013 | 4691 |
| 10 | 2014 | 3685 |
The line chart shows the number of registered electric vehicles by model year. The insights that we can draw from this chart are:
The number of vehicles rises sharply for recent model years, peaking at 26,530 for 2022, followed by 18,364 for 2021.
The growth is not strictly monotonic: model year 2018 (14,246 vehicles) outnumbers both 2019 (10,266) and 2020 (11,038).
Earlier model years are far less represented; among the ten most common model years, the smallest count is 3,685 (2014).
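The year-to-year change can be computed directly from the ten counts listed in the table above. This sketch retypes those counts by hand (model years 2013 to 2022) and uses pct_change; it covers only the years shown in the table, not the full year_wise_cars frame.

```python
import pandas as pd

# Counts copied from the table above, ordered by model year.
year_counts = pd.Series(
    {2013: 4691, 2014: 3685, 2015: 4940, 2016: 5735, 2017: 8644,
     2018: 14246, 2019: 10266, 2020: 11038, 2021: 18364, 2022: 26530},
    name="num_cars",
)

# Percentage change relative to the previous model year.
growth = year_counts.pct_change().mul(100).round(1)

print(growth)
```

The first entry is NaN because there is no earlier year in the series to compare against.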
car_counts_St = df['State'].value_counts().nlargest(10)
fig = px.bar(car_counts_St, x=car_counts_St.index, y=car_counts_St.values,
labels={'x': 'State', 'y': 'Number of Cars (log scale)'},
title='Top 10 Count of Cars per State',
template='plotly_dark')
fig.update_layout(yaxis_type='log')
fig.update_traces(marker_color='steelblue')
fig.show()
car_counts_St_df = car_counts_St.to_frame()
car_counts_St_df.style.background_gradient(cmap='Blues')
| State | Number of cars |
|---|---|
| WA | 112348 |
| CA | 76 |
| VA | 36 |
| MD | 26 |
| TX | 14 |
| CO | 9 |
| NV | 8 |
| GA | 7 |
| NC | 7 |
| CT | 6 |
Washington (WA) accounts for nearly all vehicles in the dataset: 112,348 of the 112,634 records.
cnt_MkCity = df.groupby(['City', 'Make']).size().reset_index(name='Count')
# cnt_MkCity already has one row per (City, Make) pair, so this regrouping leaves the counts unchanged
grouped_data_cty = cnt_MkCity.groupby(['City', 'Make'])['Count'].sum().reset_index()
# Total number of vehicles per city and per make
city_counts = grouped_data_cty.groupby('City')['Count'].sum().reset_index()
make_counts = grouped_data_cty.groupby('Make')['Count'].sum().reset_index()
# Sort the cities by count in descending order, and select the top 10
top_cities = city_counts.sort_values(by='Count', ascending=False).head(10)
top_makes = make_counts.sort_values(by='Count', ascending=False).head(10)
# Filter the data to only include the top 10 cities and top 10 makes
filtered_data_Cty = grouped_data_cty[
grouped_data_cty['City'].isin(top_cities['City']) & grouped_data_cty['Make'].isin(top_makes['Make'])
]
pivoted_data_cty = filtered_data_Cty.pivot(index='City', columns='Make', values='Count').fillna(0)
fig = go.Figure()
for make in top_makes['Make']:
    fig.add_trace(go.Bar(name=make, x=pivoted_data_cty.index, y=pivoted_data_cty[make]))
fig.update_layout(title='Top 10 Make distribution count per top 10 City',
xaxis_title='City',
yaxis_title='Number of Cars')
fig.show()
pivoted_data_cty.head()
| City | AUDI | BMW | CHEVROLET | FORD | KIA | NISSAN | TESLA | TOYOTA | VOLKSWAGEN | VOLVO |
|---|---|---|---|---|---|---|---|---|---|---|
| Bellevue | 120 | 295 | 211 | 131 | 131 | 527 | 3714 | 140 | 76 | 103 |
| Bothell | 48 | 119 | 183 | 128 | 98 | 374 | 1950 | 67 | 48 | 57 |
| Kirkland | 97 | 174 | 173 | 92 | 117 | 316 | 2112 | 70 | 70 | 91 |
| Olympia | 48 | 70 | 456 | 215 | 177 | 360 | 805 | 170 | 62 | 39 |
| Redmond | 70 | 168 | 189 | 110 | 112 | 460 | 2570 | 101 | 77 | 66 |
The grouped bar plot shows the distribution of electric vehicles by make in the 10 cities with the most vehicles in the dataset. The insights that we can draw from this plot are:
Tesla is the most popular make by a wide margin in each of the five cities shown in the table above. Nissan is usually second; third place varies (Chevrolet in Bothell and Redmond, BMW in Bellevue and Kirkland), and in Olympia Chevrolet ranks second, ahead of Nissan.
All ten top makes are represented in each of the cities shown, so demand is spread across a variety of manufacturers even though Tesla dominates.
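A city-by-make count table like the one above can also be built in a single step with pd.crosstab. The sketch below uses invented rows and is offered as a possible shortcut, not as the notebook's method; filtering to the top 10 cities and makes would still be a separate step.

```python
import pandas as pd

# Toy data: made-up registrations in two cities.
toy = pd.DataFrame({
    "City": ["Bellevue", "Bellevue", "Bellevue", "Olympia", "Olympia"],
    "Make": ["TESLA", "TESLA", "NISSAN", "TESLA", "CHEVROLET"],
})

# Rows are cities, columns are makes, cells are vehicle counts (0 where absent).
make_by_city = pd.crosstab(toy["City"], toy["Make"])

print(make_by_city)
```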
import pandas as pd
import plotly.graph_objects as go
# Calculate the counts of cars for each state and make combination
cnt_Mk_St = df.groupby(['State', 'Make']).size().reset_index(name='Count')
# cnt_Mk_St already has one row per (State, Make) pair, so this regrouping leaves the counts unchanged
grouped_data_St = cnt_Mk_St.groupby(['State', 'Make'])['Count'].sum().reset_index()
# Total number of vehicles per state and per make
st_counts = grouped_data_St.groupby('State')['Count'].sum().reset_index()
make_counts = grouped_data_St.groupby('Make')['Count'].sum().reset_index()
# Sort the states by count in descending order, and select the top 10
top_States = st_counts.sort_values(by='Count', ascending=False).head(10)
top_makes = make_counts.sort_values(by='Count', ascending=False).head(10)
# Filter the data to only include the top 10 states and top 10 makes
filtered_data_St = grouped_data_St[
grouped_data_St['State'].isin(top_States['State']) & grouped_data_St['Make'].isin(top_makes['Make'])
]
pivoted_data_St = filtered_data_St.pivot(index='State', columns='Make', values='Count').fillna(0)
fig = go.Figure()
for make in top_makes['Make']:
    fig.add_trace(go.Bar(name=make, x=pivoted_data_St.index, y=pivoted_data_St[make]))
fig.update_layout(title='Top 10 Make distribution count per top 10 State',
xaxis_title='State',
yaxis_title='Number of Cars',
yaxis_type='log') # Set y-axis to logarithmic scale
fig.show()
pivoted_data_St.head(10)
| State | AUDI | BMW | CHEVROLET | FORD | KIA | NISSAN | TESLA | TOYOTA | VOLKSWAGEN | VOLVO |
|---|---|---|---|---|---|---|---|---|---|---|
| AZ | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 3.0 | 1.0 | 0.0 | 1.0 |
| CA | 2.0 | 1.0 | 3.0 | 7.0 | 1.0 | 2.0 | 40.0 | 6.0 | 2.0 | 4.0 |
| CO | 0.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 |
| GA | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 |
| MD | 1.0 | 1.0 | 2.0 | 2.0 | 0.0 | 1.0 | 10.0 | 2.0 | 0.0 | 1.0 |
| NC | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 |
| NV | 0.0 | 0.0 | 2.0 | 1.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 |
| TX | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 9.0 | 1.0 | 0.0 | 0.0 |
| VA | 0.0 | 3.0 | 4.0 | 2.0 | 1.0 | 2.0 | 17.0 | 3.0 | 1.0 | 0.0 |
| WA | 2327.0 | 4665.0 | 10162.0 | 5795.0 | 4476.0 | 12866.0 | 51944.0 | 4384.0 | 2509.0 | 2281.0 |
fig = px.histogram(df, x='Electric Range', color='Electric Vehicle Type',
nbins=30, barmode='overlay', histfunc='count',
labels={'Electric Range': 'Electric Range', 'Electric Vehicle Type': 'Vehicle Type'},
title='Electric Vehicle Range Distribution by Vehicle Type')
# Show the plot
fig.show()
The histogram shows the distribution of electric range by vehicle type. The insights that we can draw from this plot are:
Electric ranges vary widely, from 0 (recorded for vehicles whose CAFV eligibility is listed as unknown, such as the Tesla Model Y row shown earlier) to over 300 miles.
Battery electric vehicles (BEVs) span a much wider range of values than plug-in hybrid electric vehicles (PHEVs).
Reading from the histogram, BEV ranges are centered around 200 miles, while PHEV ranges are around 50 miles or less. Entries recorded as 0 would pull a computed BEV average below this figure.
There are a few BEVs with ranges of over 300 miles, but most BEVs have ranges of less than 250 miles.
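The effect of the zero entries on an average can be checked numerically. The sketch below uses only the ten rows displayed earlier in the notebook (the first five and last five of the DataFrame), with the type names shortened to "BEV" and "PHEV"; the resulting figures describe that small sample, not the full dataset.

```python
import pandas as pd

# The ten rows displayed earlier (head and tail of df), types abbreviated.
sample = pd.DataFrame({
    "Electric Vehicle Type": ["PHEV", "PHEV", "BEV", "BEV", "PHEV",
                              "BEV", "BEV", "PHEV", "PHEV", "PHEV"],
    "Electric Range": [42, 38, 73, 238, 26, 0, 150, 38, 26, 18],
})

# Mean range per type, with the zero entry included.
mean_all = sample.groupby("Electric Vehicle Type")["Electric Range"].mean()

# Mean range per type after dropping entries recorded as 0.
known = sample[sample["Electric Range"] > 0]
mean_known = known.groupby("Electric Vehicle Type")["Electric Range"].mean()

print(mean_all.round(1))
print(mean_known.round(1))
```

In this sample the single zero entry lowers the BEV mean noticeably, while the PHEV mean is unaffected because no PHEV row has a range of 0.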
Task - 2
# Number of vehicles per model year for the top 10 companies (model year 2011 onward)
data = df.copy()
data['top_10'] = data['Make'].apply(lambda x: 1 if x in top_10_companies else 0)
data = data[data['top_10'] == 1]
data = data[data['Model Year'] >= 2011]
# Create the Count plot using Plotly
fig = px.histogram(data, x='Model Year', color='Make', barmode='group', labels={'Model Year': 'Model Year', 'Make': 'Manufacturer'},
title='Model Year Distribution for Top 10 Companies (Since 2011)',
template='ggplot2')
# Show the plot
fig.show()
import re
Location_data = df.groupby('Vehicle Location').count()['County'].reset_index()
Location_data.rename(columns={'Vehicle Location': 'Locations', 'County': 'Count'}, inplace=True)
# Extract longitude and latitude from 'Locations'.
# Points are stored as "POINT (longitude latitude)", so the first number is the longitude.
def extract_longitude(location):
    try:
        coords = re.findall(r'[-+]?\d*\.\d+|\d+', location.split('(')[-1])
        return float(coords[0])
    except (AttributeError, IndexError, ValueError):
        # Non-string or placeholder values such as 'Unknown' have no coordinates
        return None
def extract_latitude(location):
    try:
        coords = re.findall(r'[-+]?\d*\.\d+|\d+', location.split('(')[-1])
        return float(coords[1])
    except (AttributeError, IndexError, ValueError):
        return None
Location_data['Longitude'] = Location_data['Locations'].apply(extract_longitude)
Location_data['Latitude'] = Location_data['Locations'].apply(extract_latitude)
Location_data.dropna(subset=['Latitude', 'Longitude'], inplace=True)
# Longitude goes on the x-axis and latitude on the y-axis
fig = px.scatter(Location_data, x='Longitude', y='Latitude', size='Count', color='Count',
                 labels={'Latitude': 'Latitude', 'Longitude': 'Longitude', 'Count': 'Count'},
                 title='Vehicle Locations and Counts',
                 hover_data=['Locations', 'Count'])
fig.update_layout(xaxis_range=[-130, -60], yaxis_range=[20, 60])
fig.show()
The scatter plot shows the locations of electric vehicles in the United States, with the size of the points representing the number of vehicles at that location. The insights that we can draw from this plot are:
The vehicles are concentrated in the Pacific Northwest, which is expected because Washington accounts for 112,348 of the 112,634 records. Points in the rest of the United States, including the eastern, southern, and central states, represent only a handful of vehicles each.
The size of the points shows that there is a wide variation in the number of electric vehicles at different locations.
The largest points appear to fall on populated areas, which suggests that there are more electric vehicles where more people live. The dataset has no population figures, so this is a visual impression and not a measured correlation.
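The Vehicle Location strings in the sample rows have the form POINT (x y), where the first number is the longitude and the second is the latitude (Key West, for example, is at about -81.8 longitude and 24.55 latitude). A vectorized way to split them is sketched below with pandas str.extract; the regular expression and the output column names are choices made for this illustration, and 'Unknown' is the placeholder the notebook fills in for missing locations.

```python
import pandas as pd

# Two location strings from the sample rows plus the 'Unknown' placeholder.
locations = pd.Series(
    ["POINT (-81.80023 24.5545)", "POINT (-120.50721 46.60448)", "Unknown"],
    name="Vehicle Location",
)

# Named groups become column names; rows that do not match yield NaN.
pattern = r"POINT \((?P<Longitude>-?\d+(?:\.\d+)?) (?P<Latitude>-?\d+(?:\.\d+)?)\)"
coords = locations.str.extract(pattern).astype(float)

print(coords)
```

Rows with NaN coordinates can then be removed with dropna, as the notebook does after its own extraction step.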
df_copy = df.copy()
# Extract longitude and latitude from 'Vehicle Location'.
# Points are stored as "POINT (longitude latitude)", so the first number is the longitude.
def extract_longitude(location):
    try:
        coords = re.findall(r'[-+]?\d*\.\d+|\d+', location.split('(')[-1])
        return float(coords[0])
    except (AttributeError, IndexError, ValueError):
        # Non-string or placeholder values such as 'Unknown' have no coordinates
        return None
def extract_latitude(location):
    try:
        coords = re.findall(r'[-+]?\d*\.\d+|\d+', location.split('(')[-1])
        return float(coords[1])
    except (AttributeError, IndexError, ValueError):
        return None
df_copy['Longitude'] = df_copy['Vehicle Location'].apply(extract_longitude)
df_copy['Latitude'] = df_copy['Vehicle Location'].apply(extract_latitude)
df_copy.dropna(subset=['Latitude', 'Longitude'], inplace=True)
fig = px.scatter(df_copy, x='Longitude', y='Latitude', color='Clean Alternative Fuel Vehicle (CAFV) Eligibility',
                 labels={'Latitude': 'Latitude', 'Longitude': 'Longitude',
                         'Clean Alternative Fuel Vehicle (CAFV) Eligibility': 'CAFV Eligibility'},
                 title='Vehicle Locations by CAFV Eligibility')
fig.update_layout(xaxis_range=[-130, -60], yaxis_range=[20, 50])
fig.show()
The scatter plot shows the distribution of electric vehicles by longitude and latitude, colored by CAFV eligibility. The insights that we can draw from this plot are:
There is a clear cluster of electric vehicles in the northwestern United States, as expected for a dataset in which Washington accounts for almost all records.
Smaller groups of points appear in the eastern United States (for example VA and MD in the state counts), but each state there has only a few dozen vehicles at most.
There are fewer electric vehicles in the southern and central United States.
Most CAFV-eligible vehicles appear in the north, but this mainly reflects where the records are concentrated; the plot alone does not establish a relationship between latitude and CAFV eligibility.
fig = px.scatter(df_copy, x='Longitude', y='Latitude', color='Electric Vehicle Type',
                 labels={'Latitude': 'Latitude', 'Longitude': 'Longitude',
                         'Electric Vehicle Type': 'Electric Vehicle Type'},
                 title='Vehicle Locations by Electric Vehicle Type')
# Set the plot limits for longitude (x-axis) and latitude (y-axis)
fig.update_layout(xaxis_range=[-130, -60], yaxis_range=[20, 50])
# Show the plot
fig.show()
states = list(df.groupby('State').count().sort_values(by='City',ascending=False)['City'].index)
values = df.groupby('State').count().sort_values(by='City',ascending=False)['City'].values
data = pd.DataFrame(df.groupby('State').count().sort_values(by='City',ascending=False)['City'])
data = data.reset_index()
data.columns = ['State','Count']
fig = px.choropleth(data,
locations='State',
locationmode="USA-states",
color='Count',
color_continuous_scale="blues",
scope="usa")
fig.show()
Task - 3
# Group the data by 'Model Year' and 'Make', and calculate the count for each group
ev_make_count_by_year = df.groupby(['Model Year', 'Make']).size().reset_index(name='Count')
# Ensure all combinations of 'Model Year' and 'Make' with zero counts are included
all_model_years = sorted(df['Model Year'].unique())  # sorted so the animation frames play in chronological order
all_makes = df['Make'].unique()
all_combinations = pd.MultiIndex.from_product([all_model_years, all_makes], names=['Model Year', 'Make'])
all_combinations_df = pd.DataFrame(index=all_combinations).reset_index()
ev_make_count_by_year = pd.merge(all_combinations_df, ev_make_count_by_year, on=['Model Year', 'Make'], how='left')
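# Assign the result back; calling fillna(inplace=True) on a selected column is deprecated in recent pandas
ev_make_count_by_year['Count'] = ev_make_count_by_year['Count'].fillna(0)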
# Create the Racing Bar Plot using Plotly
fig = px.bar(ev_make_count_by_year,
x='Count',
y='Make',
animation_frame='Model Year',
color='Make',
labels={'Make': 'EV Make', 'Count': 'Count'},
title='EV Maker and Count Each Year'
)
# Customize the layout
fig.update_layout(
xaxis_title='Count',
yaxis_title='EV Make',
yaxis={'categoryorder': 'total ascending'}
)
fig.show()
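The zero-filling step above (every model year paired with every make, with 0 where no vehicles exist) can also be expressed by reindexing the grouped counts against the full set of combinations. The following is a minimal sketch on invented rows, offered as an alternative formulation and not as the notebook's method.

```python
import pandas as pd

# Toy data: NISSAN has no 2022 record, so that pair must be filled with 0.
toy = pd.DataFrame({
    "Model Year": [2021, 2021, 2022, 2022, 2022],
    "Make": ["TESLA", "NISSAN", "TESLA", "TESLA", "TESLA"],
})

# Count vehicles per (Model Year, Make) pair.
counts = toy.groupby(["Model Year", "Make"]).size()

# Every combination of year and make, with years in chronological order.
full_index = pd.MultiIndex.from_product(
    [sorted(toy["Model Year"].unique()), sorted(toy["Make"].unique())],
    names=["Model Year", "Make"],
)

# Missing combinations get a count of 0.
filled = counts.reindex(full_index, fill_value=0).reset_index(name="Count")

print(filled)
```

Because the index is built from sorted years, the rows come out in chronological order, which is the order an animated chart would step through.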
Conclusion
The electric vehicle market is growing rapidly: the number of registered vehicles rose from 4,940 for model year 2015 to 26,530 for model year 2022.
If this trend holds, the electric vehicle market is likely to continue to grow in the coming years.
Tesla is the leading electric vehicle manufacturer in the dataset.
The majority of the vehicles are Battery Electric Vehicles (BEVs), and Tesla produces BEVs.
The location plots suggest that there are more electric vehicles in areas with more people, although the dataset contains no population figures to confirm this.
Most BEVs are in the northwestern states, which reflects the fact that nearly all records come from Washington.